Hierarchical Modelling - Practical 1

Author

Lindsay F Banin

More tree allometry

As trees grow, they increase both horizontally in girth (and diameter) and vertically in height. But as trees age and gain size, the rate of height growth declines; the relationship is non-linear (e.g. Banin et al. 2012). This is thought to happen because as trees gain height they become less limited by light, but more limited by other growth resources (water, nutrients…) and may put more energy into reproductive processes. There are remarkable similarities in the nature of these non-linear relationships (so-called ‘rules of ecology’), yet they still vary across sites and across species. These grouping variables add ‘structure’ to our data, i.e. the groupings create non-independence among observations.

We will use the relationships between height and diameter within the UK tree dataset to explore the application of hierarchical models in rstanarm.

Explore the data

Load the “trees.csv” data we have already been using.

As usual, the best first step of an analysis is to get a feel for the data by plotting. We will first plot tree height against diameter (diameter at breast height; DBH), with points coloured according to the different species in the dataset. What do you observe about this plot?

Next, we will natural-log transform both variables, tree_height and dbh. This transforms the non-linear power-law function y = aX^b into a linear function, log(y) = log(a) + b*log(X). What do you observe now?

df <- read.csv(url("https://raw.githubusercontent.com/NERC-CEH/beem_data/main/trees.csv"))
head(df)
  tree_number age species cultivation soil_type tree_height timber_height   dbh
1           1  44      SS           1         1       24.69         22.25 35.56
2           2  44      SS           1         1       21.64         20.12 26.67
3           3  44      SS           1         1       24.38         21.03 31.50
4           4  44      SS           1         1       23.47         18.90 21.84
5           5  23      SS           2        4b        7.62          6.40 21.08
6           6  23      SS           2        4b        8.84          5.33 12.95
  tree_mass stem_mass taper crown_mass root_mass
1    1325.2    1134.0 1.500      191.2     466.2
2     787.0     693.0 0.911       94.0     415.8
3    1055.2     893.2 1.237      162.0     327.6
4     675.0     617.4 0.629       57.6     194.4
5     185.4     130.5 2.000       54.9      65.2
6      77.4      47.7 0.772       29.7      22.9
library(ggplot2)
# Plot the raw data
p <- ggplot(df, aes(dbh, tree_height, colour = species))
p <- p + geom_point()
p

# Now natural-log transform the height and diameter at breast height (DBH)
p <- ggplot(df, aes(log(dbh), log(tree_height), colour = species))
p <- p + geom_point()
p <- p + stat_smooth(method = "lm")
p
`geom_smooth()` using formula = 'y ~ x'

The natural-log transformation will not always be the best for a non-linear relationship (for example, if your relationship exhibits an asymptote), but this is good enough for now!
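As a quick sanity check on the algebra above, the short simulation below (not part of the tree dataset; the values of a and b are arbitrary) generates data from a pure power law and shows that a linear model on the log scale recovers the exponent b as the slope.

# Simulated check: data from y = a * x^b with multiplicative noise
set.seed(1)
a <- 1.5; b <- 0.7
x <- runif(200, 5, 60)
y <- a * x^b * exp(rnorm(200, sd = 0.1))
coef(lm(log(y) ~ log(x)))  # intercept ~ log(a) = 0.405, slope ~ b = 0.7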

The package rstanarm uses the model formula syntax of the lme4 package, which fits hierarchical models using a maximum likelihood approach. In lme4, linear mixed-effects models use the function ‘lmer’ and non-linear mixed-effects models use the function ‘nlmer’. Before we move to the Bayesian formulation in rstanarm, let’s explore a simple model using lme4. (If this package is not familiar, we could easily dedicate a whole training session to it! But we will focus on the essential elements for now.)

Creating a model using lme4

Here we will model the relationship between tree height and diameter using the lmer function from lme4. As above, we take the log of both variables so that the relationship is linear. Much of this code will look familiar, compared with a linear regression (lm). Here, we have an additional term which specifies how the grouping variable ‘species’ is used in the model. In the first model (m1_lme) below, we specify that the intercept of the height-diameter relationship varies by species using (1|species). The second model formulation allows both the intercept and slope to vary by species using (diam|species). In each case, the species-level intercepts and slopes are deviations from the parameter estimates for the whole dataset (the fixed effects), and these deviations are treated as draws from a common random distribution.

## Assess Tree Height and Diameter Allometric relationship using mixed effects model with lme4 package
library(lme4)
Loading required package: Matrix
height <- log(df$tree_height)
diam <- log(df$dbh)
m1_lme <- lmer(height ~ diam + (1|species), data = df, na.action = na.exclude)
summary(m1_lme)
Linear mixed model fit by REML ['lmerMod']
Formula: height ~ diam + (1 | species)
   Data: df

REML criterion at convergence: -1527.4

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.9978 -0.6109  0.0439  0.6544  3.5501 

Random effects:
 Groups   Name        Variance Std.Dev.
 species  (Intercept) 0.008072 0.08984 
 Residual             0.025564 0.15989 
Number of obs: 1901, groups:  species, 13

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.44100    0.05137   8.584
diam         0.72257    0.01473  49.041

Correlation of Fixed Effects:
     (Intr)
diam -0.860
ranef(m1_lme)
$species
    (Intercept)
CP -0.065595914
DF  0.074487937
EL  0.062835193
GF  0.163073601
JL  0.037563658
LC -0.181808088
LP  0.003878361
NF  0.006101305
NS -0.044023529
RC -0.082740006
SP -0.037135707
SS  0.026345056
WH  0.037018135

with conditional variances for "species" 
m2_lme <- lmer(height ~ diam + (diam|species), data = df, na.action = na.exclude)
summary(m2_lme)
Linear mixed model fit by REML ['lmerMod']
Formula: height ~ diam + (diam | species)
   Data: df

REML criterion at convergence: -1551.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.0397 -0.6159  0.0329  0.6479  3.5857 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 species  (Intercept) 0.26186  0.5117        
          diam        0.02559  0.1600   -0.98
 Residual             0.02503  0.1582        
Number of obs: 1901, groups:  species, 13

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.63809    0.17749   3.595
diam         0.65920    0.05674  11.618

Correlation of Fixed Effects:
     (Intr)
diam -0.988
ranef(m2_lme)
$species
   (Intercept)         diam
CP  0.69731911 -0.249574016
DF  0.27465251 -0.064409200
EL -0.21312524  0.090547610
GF  0.79823959 -0.205792952
JL  0.17060288 -0.045197538
LC -0.54859640  0.120019866
LP -0.54348420  0.184344324
NF  0.08496274 -0.028042799
NS -0.16409454  0.037731355
RC -0.11114522  0.008136213
SP  0.02984401 -0.023580739
SS -0.16943106  0.062926980
WH -0.30574417  0.112890897

with conditional variances for "species" 
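
If you want a quick likelihood-based comparison of the random-intercept and random-intercept-and-slope models before moving on, the optional sketch below (not shown in the original output) uses anova(), which refits both models with maximum likelihood before comparing them, alongside their AIC values.

# Optional comparison of m1_lme and m2_lme (anova refits with ML, not REML)
anova(m1_lme, m2_lme)
AIC(m1_lme, m2_lme)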

As we expected from the plotting, the relationships between tree_height and dbh are ‘shifted’ along the y-axis according to species. Let’s first proceed with our simpler model in rstanarm. Here, we run it using the function stan_glmer (generalised linear mixed-effects model), explicitly specifying the family as Gaussian (Normal); we could also run it using stan_lmer in this case. We will return to more examples with generalised linear models later.

We can see that there are lots of similarities between the lmer models above and the stan_glmer model below, but we also add some extra arguments to prescribe how our Bayesian analysis is run - notice the chains and iter (iterations) arguments. As these numbers get higher the model takes longer to run, so you may want to check things are behaving before setting it off on lots of iterations. Let’s start small-ish. If the samples are inadequate and the chains are not converging, you will get some useful warnings about effective sample size and R-hat (https://mc-stan.org/rstanarm/articles/rstanarm.html). If the chains have not mixed well (so that between- and within-chain estimates do not agree), R-hat will be larger than 1.
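
If your machine has spare cores, you can also run the chains in parallel, as suggested in the rstanarm start-up message below; this shortens the wall-clock time but does not change the results.

# Optional: run MCMC chains in parallel across available cores
options(mc.cores = parallel::detectCores())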

## Re-run using rstanarm
library(rstanarm)
Loading required package: Rcpp
This is rstanarm version 2.32.1
- See https://mc-stan.org/rstanarm/articles/priors for changes to default priors!
- Default priors may change, so it's safest to specify priors, even if equivalent to the defaults.
- For execution on a local, multicore CPU with excess RAM we recommend calling
  options(mc.cores = parallel::detectCores())
height <- log(df$tree_height)
diam <- log(df$dbh)
m1_stan <- stan_glmer(height ~ diam + (1|species), data = df,
                      family = gaussian(link = "identity"),
                      chains = 2, iter = 500, seed = 12345)

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 0.000406 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 4.06 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:   1 / 500 [  0%]  (Warmup)
Chain 1: Iteration:  50 / 500 [ 10%]  (Warmup)
Chain 1: Iteration: 100 / 500 [ 20%]  (Warmup)
Chain 1: Iteration: 150 / 500 [ 30%]  (Warmup)
Chain 1: Iteration: 200 / 500 [ 40%]  (Warmup)
Chain 1: Iteration: 250 / 500 [ 50%]  (Warmup)
Chain 1: Iteration: 251 / 500 [ 50%]  (Sampling)
Chain 1: Iteration: 300 / 500 [ 60%]  (Sampling)
Chain 1: Iteration: 350 / 500 [ 70%]  (Sampling)
Chain 1: Iteration: 400 / 500 [ 80%]  (Sampling)
Chain 1: Iteration: 450 / 500 [ 90%]  (Sampling)
Chain 1: Iteration: 500 / 500 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 5.183 seconds (Warm-up)
Chain 1:                3.028 seconds (Sampling)
Chain 1:                8.211 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 0.000322 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 3.22 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:   1 / 500 [  0%]  (Warmup)
Chain 2: Iteration:  50 / 500 [ 10%]  (Warmup)
Chain 2: Iteration: 100 / 500 [ 20%]  (Warmup)
Chain 2: Iteration: 150 / 500 [ 30%]  (Warmup)
Chain 2: Iteration: 200 / 500 [ 40%]  (Warmup)
Chain 2: Iteration: 250 / 500 [ 50%]  (Warmup)
Chain 2: Iteration: 251 / 500 [ 50%]  (Sampling)
Chain 2: Iteration: 300 / 500 [ 60%]  (Sampling)
Chain 2: Iteration: 350 / 500 [ 70%]  (Sampling)
Chain 2: Iteration: 400 / 500 [ 80%]  (Sampling)
Chain 2: Iteration: 450 / 500 [ 90%]  (Sampling)
Chain 2: Iteration: 500 / 500 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 10.664 seconds (Warm-up)
Chain 2:                1.042 seconds (Sampling)
Chain 2:                11.706 seconds (Total)
Chain 2: 
Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#bulk-ess
Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
https://mc-stan.org/misc/warnings.html#tail-ess
## try editing chains and iterations and look at warnings.
m1_stan <- stan_glmer(height ~ diam + (1|species), data = df,
                      family = gaussian(link = "identity"),
                      chains = 4, iter = 2500, seed = 12345)

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 0.000119 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.19 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 2500 [  0%]  (Warmup)
Chain 1: Iteration:  250 / 2500 [ 10%]  (Warmup)
Chain 1: Iteration:  500 / 2500 [ 20%]  (Warmup)
Chain 1: Iteration:  750 / 2500 [ 30%]  (Warmup)
Chain 1: Iteration: 1000 / 2500 [ 40%]  (Warmup)
Chain 1: Iteration: 1250 / 2500 [ 50%]  (Warmup)
Chain 1: Iteration: 1251 / 2500 [ 50%]  (Sampling)
Chain 1: Iteration: 1500 / 2500 [ 60%]  (Sampling)
Chain 1: Iteration: 1750 / 2500 [ 70%]  (Sampling)
Chain 1: Iteration: 2000 / 2500 [ 80%]  (Sampling)
Chain 1: Iteration: 2250 / 2500 [ 90%]  (Sampling)
Chain 1: Iteration: 2500 / 2500 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 10.493 seconds (Warm-up)
Chain 1:                6.416 seconds (Sampling)
Chain 1:                16.909 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 0.000149 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 1.49 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:    1 / 2500 [  0%]  (Warmup)
Chain 2: Iteration:  250 / 2500 [ 10%]  (Warmup)
Chain 2: Iteration:  500 / 2500 [ 20%]  (Warmup)
Chain 2: Iteration:  750 / 2500 [ 30%]  (Warmup)
Chain 2: Iteration: 1000 / 2500 [ 40%]  (Warmup)
Chain 2: Iteration: 1250 / 2500 [ 50%]  (Warmup)
Chain 2: Iteration: 1251 / 2500 [ 50%]  (Sampling)
Chain 2: Iteration: 1500 / 2500 [ 60%]  (Sampling)
Chain 2: Iteration: 1750 / 2500 [ 70%]  (Sampling)
Chain 2: Iteration: 2000 / 2500 [ 80%]  (Sampling)
Chain 2: Iteration: 2250 / 2500 [ 90%]  (Sampling)
Chain 2: Iteration: 2500 / 2500 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 11.59 seconds (Warm-up)
Chain 2:                5.597 seconds (Sampling)
Chain 2:                17.187 seconds (Total)
Chain 2: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
Chain 3: 
Chain 3: Gradient evaluation took 0.00012 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 1.2 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 
Chain 3: Iteration:    1 / 2500 [  0%]  (Warmup)
Chain 3: Iteration:  250 / 2500 [ 10%]  (Warmup)
Chain 3: Iteration:  500 / 2500 [ 20%]  (Warmup)
Chain 3: Iteration:  750 / 2500 [ 30%]  (Warmup)
Chain 3: Iteration: 1000 / 2500 [ 40%]  (Warmup)
Chain 3: Iteration: 1250 / 2500 [ 50%]  (Warmup)
Chain 3: Iteration: 1251 / 2500 [ 50%]  (Sampling)
Chain 3: Iteration: 1500 / 2500 [ 60%]  (Sampling)
Chain 3: Iteration: 1750 / 2500 [ 70%]  (Sampling)
Chain 3: Iteration: 2000 / 2500 [ 80%]  (Sampling)
Chain 3: Iteration: 2250 / 2500 [ 90%]  (Sampling)
Chain 3: Iteration: 2500 / 2500 [100%]  (Sampling)
Chain 3: 
Chain 3:  Elapsed Time: 10.816 seconds (Warm-up)
Chain 3:                6.225 seconds (Sampling)
Chain 3:                17.041 seconds (Total)
Chain 3: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
Chain 4: 
Chain 4: Gradient evaluation took 0.000154 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 1.54 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 4: Iteration:    1 / 2500 [  0%]  (Warmup)
Chain 4: Iteration:  250 / 2500 [ 10%]  (Warmup)
Chain 4: Iteration:  500 / 2500 [ 20%]  (Warmup)
Chain 4: Iteration:  750 / 2500 [ 30%]  (Warmup)
Chain 4: Iteration: 1000 / 2500 [ 40%]  (Warmup)
Chain 4: Iteration: 1250 / 2500 [ 50%]  (Warmup)
Chain 4: Iteration: 1251 / 2500 [ 50%]  (Sampling)
Chain 4: Iteration: 1500 / 2500 [ 60%]  (Sampling)
Chain 4: Iteration: 1750 / 2500 [ 70%]  (Sampling)
Chain 4: Iteration: 2000 / 2500 [ 80%]  (Sampling)
Chain 4: Iteration: 2250 / 2500 [ 90%]  (Sampling)
Chain 4: Iteration: 2500 / 2500 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 10.467 seconds (Warm-up)
Chain 4:                5.03 seconds (Sampling)
Chain 4:                15.497 seconds (Total)
Chain 4: 
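
To confirm numerically that the longer run has dealt with the earlier warnings, you can pull the R-hat and effective sample size columns out of the model summary matrix (a quick check; the full summary is printed further below).

# Quick convergence check: Rhat should be close to 1 and n_eff reasonably large
fit_sum <- summary(m1_stan)
round(range(fit_sum[, "Rhat"]), 3)
min(fit_sum[, "n_eff"])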

In the code chunk above, we have not explicitly stated our priors, which means rstanarm has used the defaults. There is some really useful information about how these defaults are chosen in this vignette: https://mc-stan.org/rstanarm/articles/priors.html, and on the priors for the covariance matrix (https://mc-stan.org/rstanarm/articles/glmer.html#priors-on-covariance-matrices-1). They are intended to be weakly informative: vague enough not to dominate the data, but constraining enough to rule out implausible parameter values and odd model behaviour. Let’s take a look at what these defaults are for our model.

prior_summary(m1_stan, digits = 2)
Priors for model 'm1_stan' 
------
Intercept (after predictors centered)
  Specified prior:
    ~ normal(location = 2.6, scale = 2.5)
  Adjusted prior:
    ~ normal(location = 2.6, scale = 0.62)

Coefficients
  Specified prior:
    ~ normal(location = 0, scale = 2.5)
  Adjusted prior:
    ~ normal(location = 0, scale = 2.4)

Auxiliary (sigma)
  Specified prior:
    ~ exponential(rate = 1)
  Adjusted prior:
    ~ exponential(rate = 4)

Covariance
 ~ decov(reg. = 1, conc. = 1, shape = 1, scale = 1)
------
See help('prior_summary.stanreg') for more details

Notice we have the same types of priors as for the linear regression model - for the intercept, the slope coefficient and the auxiliary parameter (sigma) - plus a prior on the covariance matrix, which relates to our random-effect term (species).

To explicitly state our priors, we could write the model formula as:

m1_stan_wpriors <- stan_glmer(height ~ diam + (1|species), data = df,
                              family = gaussian(link = "identity"),
                              prior_intercept = normal(location = 2.6, scale = 0.62),
                              prior = normal(location = 0, scale = 2.4),
                              prior_aux = exponential(rate = 4),
                              prior_covariance = decov(shape = 1, scale = 1),
                              chains = 4, iter = 2500, seed = 12345)

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
Chain 1: 
Chain 1: Gradient evaluation took 0.000136 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.36 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1: 
Chain 1: 
Chain 1: Iteration:    1 / 2500 [  0%]  (Warmup)
Chain 1: Iteration:  250 / 2500 [ 10%]  (Warmup)
Chain 1: Iteration:  500 / 2500 [ 20%]  (Warmup)
Chain 1: Iteration:  750 / 2500 [ 30%]  (Warmup)
Chain 1: Iteration: 1000 / 2500 [ 40%]  (Warmup)
Chain 1: Iteration: 1250 / 2500 [ 50%]  (Warmup)
Chain 1: Iteration: 1251 / 2500 [ 50%]  (Sampling)
Chain 1: Iteration: 1500 / 2500 [ 60%]  (Sampling)
Chain 1: Iteration: 1750 / 2500 [ 70%]  (Sampling)
Chain 1: Iteration: 2000 / 2500 [ 80%]  (Sampling)
Chain 1: Iteration: 2250 / 2500 [ 90%]  (Sampling)
Chain 1: Iteration: 2500 / 2500 [100%]  (Sampling)
Chain 1: 
Chain 1:  Elapsed Time: 10.33 seconds (Warm-up)
Chain 1:                7.882 seconds (Sampling)
Chain 1:                18.212 seconds (Total)
Chain 1: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 2).
Chain 2: 
Chain 2: Gradient evaluation took 0.000112 seconds
Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 1.12 seconds.
Chain 2: Adjust your expectations accordingly!
Chain 2: 
Chain 2: 
Chain 2: Iteration:    1 / 2500 [  0%]  (Warmup)
Chain 2: Iteration:  250 / 2500 [ 10%]  (Warmup)
Chain 2: Iteration:  500 / 2500 [ 20%]  (Warmup)
Chain 2: Iteration:  750 / 2500 [ 30%]  (Warmup)
Chain 2: Iteration: 1000 / 2500 [ 40%]  (Warmup)
Chain 2: Iteration: 1250 / 2500 [ 50%]  (Warmup)
Chain 2: Iteration: 1251 / 2500 [ 50%]  (Sampling)
Chain 2: Iteration: 1500 / 2500 [ 60%]  (Sampling)
Chain 2: Iteration: 1750 / 2500 [ 70%]  (Sampling)
Chain 2: Iteration: 2000 / 2500 [ 80%]  (Sampling)
Chain 2: Iteration: 2250 / 2500 [ 90%]  (Sampling)
Chain 2: Iteration: 2500 / 2500 [100%]  (Sampling)
Chain 2: 
Chain 2:  Elapsed Time: 10.319 seconds (Warm-up)
Chain 2:                5.013 seconds (Sampling)
Chain 2:                15.332 seconds (Total)
Chain 2: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 3).
Chain 3: 
Chain 3: Gradient evaluation took 0.000129 seconds
Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 1.29 seconds.
Chain 3: Adjust your expectations accordingly!
Chain 3: 
Chain 3: 
Chain 3: Iteration:    1 / 2500 [  0%]  (Warmup)
Chain 3: Iteration:  250 / 2500 [ 10%]  (Warmup)
Chain 3: Iteration:  500 / 2500 [ 20%]  (Warmup)
Chain 3: Iteration:  750 / 2500 [ 30%]  (Warmup)
Chain 3: Iteration: 1000 / 2500 [ 40%]  (Warmup)
Chain 3: Iteration: 1250 / 2500 [ 50%]  (Warmup)
Chain 3: Iteration: 1251 / 2500 [ 50%]  (Sampling)
Chain 3: Iteration: 1500 / 2500 [ 60%]  (Sampling)
Chain 3: Iteration: 1750 / 2500 [ 70%]  (Sampling)
Chain 3: Iteration: 2000 / 2500 [ 80%]  (Sampling)
Chain 3: Iteration: 2250 / 2500 [ 90%]  (Sampling)
Chain 3: Iteration: 2500 / 2500 [100%]  (Sampling)
Chain 3: 
Chain 3:  Elapsed Time: 13.242 seconds (Warm-up)
Chain 3:                7.928 seconds (Sampling)
Chain 3:                21.17 seconds (Total)
Chain 3: 

SAMPLING FOR MODEL 'continuous' NOW (CHAIN 4).
Chain 4: 
Chain 4: Gradient evaluation took 0.00032 seconds
Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 3.2 seconds.
Chain 4: Adjust your expectations accordingly!
Chain 4: 
Chain 4: 
Chain 4: Iteration:    1 / 2500 [  0%]  (Warmup)
Chain 4: Iteration:  250 / 2500 [ 10%]  (Warmup)
Chain 4: Iteration:  500 / 2500 [ 20%]  (Warmup)
Chain 4: Iteration:  750 / 2500 [ 30%]  (Warmup)
Chain 4: Iteration: 1000 / 2500 [ 40%]  (Warmup)
Chain 4: Iteration: 1250 / 2500 [ 50%]  (Warmup)
Chain 4: Iteration: 1251 / 2500 [ 50%]  (Sampling)
Chain 4: Iteration: 1500 / 2500 [ 60%]  (Sampling)
Chain 4: Iteration: 1750 / 2500 [ 70%]  (Sampling)
Chain 4: Iteration: 2000 / 2500 [ 80%]  (Sampling)
Chain 4: Iteration: 2250 / 2500 [ 90%]  (Sampling)
Chain 4: Iteration: 2500 / 2500 [100%]  (Sampling)
Chain 4: 
Chain 4:  Elapsed Time: 14.737 seconds (Warm-up)
Chain 4:                7.243 seconds (Sampling)
Chain 4:                21.98 seconds (Total)
Chain 4: 
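
Before interpreting the fit, it is worth double-checking that the priors we stated explicitly were applied as intended; prior_summary() works on this model just as it did above.

# Confirm the explicitly specified priors were used
prior_summary(m1_stan_wpriors, digits = 2)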

Now let’s explore and interpret our model.

The Bayesian uncertainty interval indicates that, having seen the data, we believe there is a 0.95 probability that the slope parameter for log(DBH) (or diam) lies between the lower bound ci95[1,1] and the upper bound ci95[1,2] - a convenient interpretation for the parameters of interest, as we heard on day one!

print(m1_stan) # for further information on interpretation: ?print.stanreg
stan_glmer
 family:       gaussian [identity]
 formula:      height ~ diam + (1 | species)
 observations: 1901
------
            Median MAD_SD
(Intercept) 0.4    0.1   
diam        0.7    0.0   

Auxiliary parameter(s):
      Median MAD_SD
sigma 0.2    0.0   

Error terms:
 Groups   Name        Std.Dev.
 species  (Intercept) 0.10    
 Residual             0.16    
Num. levels: species 13 

------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg
summary(m1_stan, digits = 5)

Model Info:
 function:     stan_glmer
 family:       gaussian [identity]
 formula:      height ~ diam + (1 | species)
 algorithm:    sampling
 sample:       5000 (posterior sample size)
 priors:       see help('prior_summary')
 observations: 1901
 groups:       species (13)

Estimates:
                                         mean     sd       10%      50%   
(Intercept)                             0.44250  0.05306  0.37477  0.44196
diam                                    0.72217  0.01486  0.70301  0.72206
b[(Intercept) species:CP]              -0.06554  0.03369 -0.10657 -0.06614
b[(Intercept) species:DF]               0.07402  0.03727  0.02746  0.07336
b[(Intercept) species:EL]               0.06269  0.04109  0.01149  0.06218
b[(Intercept) species:GF]               0.16235  0.03755  0.11626  0.16078
b[(Intercept) species:JL]               0.03757  0.03700 -0.00868  0.03691
b[(Intercept) species:LC]              -0.18488  0.06429 -0.26942 -0.18217
b[(Intercept) species:LP]               0.00356  0.03127 -0.03522  0.00325
b[(Intercept) species:NF]               0.00552  0.04475 -0.05024  0.00415
b[(Intercept) species:NS]              -0.04435  0.03201 -0.08351 -0.04466
b[(Intercept) species:RC]              -0.08425  0.05446 -0.15454 -0.08265
b[(Intercept) species:SP]              -0.03720  0.03210 -0.07659 -0.03782
b[(Intercept) species:SS]               0.02599  0.03001 -0.00946  0.02519
b[(Intercept) species:WH]               0.03680  0.03645 -0.00901  0.03619
sigma                                   0.16001  0.00256  0.15670  0.16001
Sigma[species:(Intercept),(Intercept)]  0.01022  0.00575  0.00470  0.00877
                                         90%   
(Intercept)                             0.51057
diam                                    0.74133
b[(Intercept) species:CP]              -0.02242
b[(Intercept) species:DF]               0.12207
b[(Intercept) species:EL]               0.11497
b[(Intercept) species:GF]               0.21122
b[(Intercept) species:JL]               0.08527
b[(Intercept) species:LC]              -0.10383
b[(Intercept) species:LP]               0.04316
b[(Intercept) species:NF]               0.06339
b[(Intercept) species:NS]              -0.00417
b[(Intercept) species:RC]              -0.01666
b[(Intercept) species:SP]               0.00332
b[(Intercept) species:SS]               0.06434
b[(Intercept) species:WH]               0.08275
sigma                                   0.16333
Sigma[species:(Intercept),(Intercept)]  0.01763

Fit Diagnostics:
           mean    sd      10%     50%     90%  
mean_PPD 2.62401 0.00522 2.61740 2.62394 2.63075

The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).

MCMC diagnostics
                                       mcse    Rhat    n_eff
(Intercept)                            0.00108 1.00012 2405 
diam                                   0.00021 0.99950 4908 
b[(Intercept) species:CP]              0.00099 1.00175 1168 
b[(Intercept) species:DF]              0.00099 1.00101 1407 
b[(Intercept) species:EL]              0.00097 1.00058 1776 
b[(Intercept) species:GF]              0.00098 1.00188 1480 
b[(Intercept) species:JL]              0.00099 1.00203 1386 
b[(Intercept) species:LC]              0.00126 0.99982 2622 
b[(Intercept) species:LP]              0.00097 1.00275 1031 
b[(Intercept) species:NF]              0.00103 1.00105 1879 
b[(Intercept) species:NS]              0.00098 1.00196 1075 
b[(Intercept) species:RC]              0.00103 0.99974 2820 
b[(Intercept) species:SP]              0.00097 1.00249 1090 
b[(Intercept) species:SS]              0.00096 1.00258  971 
b[(Intercept) species:WH]              0.00099 1.00221 1347 
sigma                                  0.00004 0.99983 4424 
Sigma[species:(Intercept),(Intercept)] 0.00016 1.00274 1337 
mean_PPD                               0.00007 0.99977 5067 
log-posterior                          0.11460 1.00125 1182 

For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
ci95 <- posterior_interval(m1_stan, prob = 0.95)
round(ci95, 2)
                                        2.5% 97.5%
(Intercept)                             0.34  0.55
diam                                    0.69  0.75
b[(Intercept) species:CP]              -0.13  0.00
b[(Intercept) species:DF]               0.00  0.15
b[(Intercept) species:EL]              -0.02  0.14
b[(Intercept) species:GF]               0.09  0.24
b[(Intercept) species:JL]              -0.03  0.11
b[(Intercept) species:LC]              -0.32 -0.07
b[(Intercept) species:LP]              -0.05  0.07
b[(Intercept) species:NF]              -0.08  0.10
b[(Intercept) species:NS]              -0.11  0.02
b[(Intercept) species:RC]              -0.19  0.02
b[(Intercept) species:SP]              -0.10  0.03
b[(Intercept) species:SS]              -0.03  0.09
b[(Intercept) species:WH]              -0.03  0.11
sigma                                   0.16  0.17
Sigma[species:(Intercept),(Intercept)]  0.00  0.03
# For specific parameters:
ci95 <- posterior_interval(m1_stan, prob = 0.95, pars = "diam")
round(ci95, 2)
     2.5% 97.5%
diam 0.69  0.75

What do you notice about the posterior medians in comparison to the lmer model results?
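
One way to answer this is to line the point estimates up side by side; the sketch below assumes m1_lme and m1_stan are both still in your workspace, and uses fixef(), which for a stanreg object returns the point estimates (posterior medians, matching the print output above).

# Compare fixed-effect point estimates: REML (lmer) vs posterior medians (rstanarm)
cbind(lmer = fixef(m1_lme), rstanarm = fixef(m1_stan))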

See Section 3 of this case study for more guidance on interpreting the model: https://mc-stan.org/users/documentation/case-studies/tutorial_rstanarm.html#prior-distributions.

Now let’s take a look at how our modelling process has performed - are our chains behaving, and is there any odd behaviour in the histograms of the MCMC samples? Notice we have a plot for every parameter and grouping level.

For more plotting options, look at the documentation here: https://mc-stan.org/rstanarm/reference/plot.stanreg.html

plot(m1_stan, "trace") # nice hairy caterpillars!

plot(m1_stan, "hist")
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

plot(m1_stan, "intervals")

bayesplot::mcmc_areas_ridges(m1_stan, regex_pars = "species", prob = 0.95)

launch_shinystan(m1_stan, ppd = FALSE)

Now let’s compare our predictions from the model to our observations using some plotting options:

y_rep <- posterior_predict(m1_stan) # ?posterior_predict
dim(y_rep)
[1] 5000 1901
pp_check(m1_stan, plotfun = "boxplot", notch = FALSE) #?pp_check

pp_check(m1_stan, plotfun = "hist", nreps = 3)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

pp_check(m1_stan)
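
Because the grouping structure is the whole point of this model, a grouped posterior predictive check is also informative: does the model reproduce, say, the median log-height within each species? The sketch below calls bayesplot directly and assumes df has no missing heights or diameters, so that height, df$species and y_rep all line up.

# Grouped posterior predictive check: observed vs replicated medians by species
bayesplot::ppc_stat_grouped(y = height, yrep = y_rep,
                            group = df$species, stat = "median")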

Now let’s apply leave-one-out cross-validation using the loo package - are there any problematic influential points?

library(loo)
This is loo version 2.8.0
- Online documentation and vignettes at mc-stan.org/loo
- As of v2.0.0 loo defaults to 1 core but we recommend using as many as possible. Use the 'cores' argument or set options(mc.cores = NUM_CORES) for an entire session. 
- Windows 10 users: loo may be very slow if 'mc.cores' is set in your .Rprofile file (see https://github.com/stan-dev/loo/issues/94).
loo_m1_stan <- loo(m1_stan)
plot(loo_m1_stan, label_points = TRUE)
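
You can also print the loo object to see the Pareto k diagnostic table directly, and, since we fitted a second model with explicitly stated (but essentially identical) priors, loo_compare() gives a quick check that the two fits have effectively the same expected predictive accuracy.

# Pareto k diagnostics and a comparison on expected predictive accuracy
print(loo_m1_stan)
loo_compare(loo_m1_stan, loo(m1_stan_wpriors))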

Additional resources

https://www.bayesrulesbook.com/chapter-15